Automated identification of abnormal infant movements from smartphone videos

Cerebral palsy (CP) is the most common cause of physical disability in childhood, occurring at a rate of 2.1 per 1000 live births. Early diagnosis is key to improving functional outcomes for children with CP. The General Movements (GMs) Assessment has high predictive validity for the detection of CP and is routinely used in high-risk infants, but only 50% of infants with CP have overt risk factors at birth. Implementing CP screening programs is therefore an important endeavour, but feasibility is limited by access to trained GMs assessors. To facilitate progress towards this goal, we report a deep learning framework for automating the GMs Assessment. Using a dedicated smartphone app, parents and caregivers recorded 503 videos at home of infants aged between 12 and 18 weeks term-corrected age. Using a deep learning algorithm, we automatically labelled and tracked 18 key body points in each video. We designed a custom pipeline to adjust for camera movement and infant size, and trained a second machine learning algorithm to predict GMs classification from body point movement. Our automated body point labelling approach achieved human-level accuracy (mean ± SD error of 3.7 ± 5.2% of infant length) compared with gold-standard human annotation. Using the body point tracking data, our prediction model achieved a cross-validated area under the curve (mean ± SD) of 0.80 ± 0.08 on unseen test data for predicting expert GMs classification, with a sensitivity of 76% ± 15% for abnormal GMs and a negative predictive value of 94% ± 3%. This work highlights the potential for automated GMs screening programs to detect abnormal movements in infants as early as three months term-corrected age using digital technologies.
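As a rough illustration of the normalisation step described above (adjusting for camera movement and infant size), a minimal sketch is given below. The array layout and the crown/mid-hip indices are assumptions for illustration only, not the study's actual labelling scheme or pipeline.

```python
import numpy as np

def normalise_clip(coords, crown_idx=0, mid_hip_idx=9):
    """Normalise tracked body points for camera motion and infant size.

    `coords` has shape (n_frames, 18, 2) holding pixel (x, y) positions.
    The crown/mid-hip indices are illustrative, not the paper's scheme.
    """
    # Centre each frame on the mid-hip point to remove the global
    # translation introduced by hand-held camera movement.
    centred = coords - coords[:, mid_hip_idx:mid_hip_idx + 1, :]
    # Infant length proxy: median crown-to-mid-hip distance over the clip.
    length = np.median(
        np.linalg.norm(coords[:, crown_idx] - coords[:, mid_hip_idx], axis=-1)
    )
    return centred / length  # coordinates now in units of infant length
```

Expressing coordinates in units of infant length makes clips comparable across infants of different sizes and across videos filmed at different distances.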


Automated body point labelling from smartphone videos

Table B: Labelling accuracy by video resolution.
Root mean square difference (RMSD) between manual and automated annotations for the training data set (n = 500 frames), by video resolution. CI = confidence interval.
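The RMSD summarised in Table B can be computed as in the sketch below, assuming matched arrays of manual and automated (x, y) annotations; expressing the result as a percentage of infant length mirrors Figure B and is illustrative.

```python
import numpy as np

def rmsd_percent(manual, auto, infant_length):
    """RMSD between manual and automated annotations, expressed as a
    percentage of infant length (crown to mid-hip).

    `manual` and `auto` are matched arrays of (x, y) positions.
    """
    errors = np.linalg.norm(manual - auto, axis=-1)  # per-point error (pixels)
    return 100.0 * np.sqrt(np.mean(errors ** 2)) / infant_length
```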

Figure A: Infant labelling. Illustration and definition of infant body point labelling. Informed consent was obtained from the parent/caregiver of the infant shown in the image.

Figure B: Labelling accuracy. Difference between manual annotations and automated DeepLabCut (DLC) labelling (top) and inter-rater reliability (bottom), expressed as a percentage of infant length. Boxes with horizontal lines represent the interquartile range and median, respectively; error bars represent the 95% confidence interval. Coloured dots represent the RMSD for each data point (n = 50 frames).

Figure C: Quality control, percentage of body points labelled. Boxes with horizontal lines represent the interquartile range and median, respectively; error bars represent the 95% confidence interval. Coloured dots represent the percentage for each data point (n = 484 videos).

Figure D: Number of times each video was included in the held-out test set.

Figure E: Model performance under different hyperparameter settings. Boxplots show the median and interquartile range of model performance (AUC) over 25 cross-validation repeats for different choices of batch size, weight regularisation, number of fully connected dense layers after convolution, learning rate, number of clips used per video per epoch during training, inclusion of metadata, use of data augmentation, and inclusion of attention modules.
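A sweep over these hyperparameters can be enumerated as in the sketch below; the candidate values are assumptions for illustration, not those used in the study, and `train_and_evaluate` is a hypothetical routine.

```python
from itertools import product

# Illustrative grid over the hyperparameters varied in Figure E.
grid = {
    "batch_size": [8, 16, 32],
    "weight_regularisation": [0.0, 1e-4, 1e-3],
    "n_dense_layers": [1, 2],
    "learning_rate": [1e-4, 1e-3],
    "clips_per_video": [1, 4],
    "use_metadata": [False, True],
    "augmentation": [False, True],
    "attention": [False, True],
}

# One configuration per combination; each would then be scored over the
# 25 cross-validation repeats, e.g. auc = train_and_evaluate(config).
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))
```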

Figure F: Feature saliency for body point position in individuals with good (GMA = 0) and poor (GMA = 1) outcomes. Total feature saliency within each 128-frame clip was averaged across all clips and over all participants in the test set in each cross-validation fold. Average saliency over folds is shown. Size and colour reflect the degree of saliency for each point.

Figure G: Average feature saliency across all cross-validation folds. Total feature saliency for body points, averaged over clips and participants for each of the 25 cross-validation folds.

Figure H: Number of high-saliency clips for all participants with normal and abnormal GM prediction. In each cross-validation fold, the number of clips with high total saliency (>90th percentile) was counted for each participant in the test set. Over all folds, the distribution of high-saliency clip counts is shown as mean ± S.D. for participants with normal or abnormal GM assessment.

Figure J: GM prediction, birth cohort, and 2-year outcome density functions. Top: density functions of Bayley-III domain scores for the motor, cognitive, and language domains, stratified by GM prediction (blue: GM = 0, normal; orange: GM = 1, abnormal) and birth cohort (preterm: solid line; term: dashed line). Bottom: peak of each density function.
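Density functions and their peaks of this kind can be estimated with a standard kernel density estimate, as in the sketch below; the stratification into GM prediction and birth cohort groups is assumed to be done before calling it.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_peak(scores, grid_size=512):
    """Location of the peak of a kernel density estimate, as in the bottom
    panel of Figure J. `scores` is a 1-D array of Bayley-III domain scores
    for one stratum (e.g. preterm infants with GM = 0)."""
    kde = gaussian_kde(scores)
    xs = np.linspace(scores.min(), scores.max(), grid_size)
    return xs[np.argmax(kde(xs))]
```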

Table A: Percentage of points from the DLC model within a given percentage of infant length (crown to mid-hip; see supplementary Figure) from human annotation.
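This accuracy summary can be computed as below; the thresholds are illustrative, and the automated and manual annotation arrays are assumed to be matched per point.

```python
import numpy as np

def percent_within(auto, manual, infant_length, thresholds=(0.025, 0.05, 0.10)):
    """Percentage of DLC-labelled points within a given fraction of infant
    length (crown to mid-hip) of the human annotation, as in Table A."""
    errors = np.linalg.norm(auto - manual, axis=-1) / infant_length
    return {thr: 100.0 * np.mean(errors <= thr) for thr in thresholds}
```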

Table C: Factors affecting DLC model performance. Linear mixed-effects model results (n = 403 videos). df = degrees of freedom.
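A linear mixed-effects model of this form can be fit with statsmodels; the variable names and toy data below are assumptions for illustration only, not the study's data set or covariates.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy per-video data; replace with the real measurements and covariates.
df = pd.DataFrame({
    "rmsd": [3.1, 4.0, 2.7, 5.2, 3.8, 4.4, 3.3, 4.9],
    "resolution": [720, 720, 1080, 480, 1080, 480, 720, 480],
    "infant_id": [1, 1, 2, 2, 3, 3, 4, 4],
})
# Random intercept per infant; fixed effect of video resolution.
model = smf.mixedlm("rmsd ~ resolution", data=df, groups=df["infant_id"])
result = model.fit()
print(result.summary())  # fixed effects and degrees of freedom, as in Table C
```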

Table D: GMs prediction, model variants, and 2-year outcomes: two-sample t-test results.
GM = 0: normal GM prediction; GM = 1: abnormal GM prediction. Metadata: birth = birth cohort (preterm/term); age = age at video acquisition; both = birth cohort and age; none = no metadata (movement data only). 2-year outcomes were assessed using the Bayley-III scales.
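Comparisons of this kind use a standard two-sample t-test; the sketch below uses toy scores for illustration, and shows the equal-variance (Student) variant since the caption does not specify whether Welch's correction was applied.

```python
import numpy as np
from scipy.stats import ttest_ind

# Toy 2-year Bayley-III scores by predicted group (illustration only).
scores_gm0 = np.array([102, 98, 110, 95, 105])   # normal GM prediction
scores_gm1 = np.array([85, 92, 78, 88, 90])      # abnormal GM prediction
t_stat, p_value = ttest_ind(scores_gm0, scores_gm1)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```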

Table E: Participant demographics. Sex is presented as the count and percentage of participants who were female. Gestation and weight are presented as mean (SD = standard deviation).